Abstract



How much did a network change since yesterday? How different is the wiring between Bob's brain (a left-handed male) and Alice's brain (a right-handed female)? Graph similarity with known node correspondence, i.e. the detection of changes in the connectivity of graphs, arises in numerous settings. In this work, we formally state the axioms and desired properties of graph similarity functions, and evaluate when state-of-the-art methods fail to detect crucial connectivity changes in graphs. We propose DeltaCon, a principled, intuitive, and scalable algorithm that assesses the similarity between two graphs on the same nodes (e.g. employees of a company, customers of a mobile carrier). Experiments on various synthetic and real graphs showcase the advantages of our method over existing similarity measures. Finally, we employ DeltaCon in real applications: (a) we classify people into groups of high and low creativity based on their brain connectivity graphs, and (b) we perform temporal anomaly detection in the who-emails-whom Enron graph.



1 Introduction 



Graphs arise naturally in numerous situations; social, traffic, collaboration and computer networks, images, protein-protein interaction networks, brain connectivity graphs and web graphs are only a few examples. A problem that comes up often in all those settings is the following: how much do two graphs or networks differ in terms of connectivity?

Graph similarity (or comparison) is a core task for sense-making: abnormal changes in network traffic may indicate a computer attack; large differences in a who-calls-whom graph may reveal a national celebration or a telecommunication problem. Besides, network similarity can give insights into behavioral patterns: is the Facebook message graph similar to the Facebook wall-to-wall graph? Tracking changes in networks over time, spotting anomalies and detecting events is a research direction that has attracted much interest (e.g., [1], [2], [3]).

Long in the purview of researchers, graph similarity is a well-studied problem, and several approaches have been proposed to solve variations of it. However, graph comparison still remains an open problem, while, with the passage of time, the list of requirements keeps growing: the exponential growth of graphs, both in number and size, calls for methods that are not only accurate, but also scalable to graphs with billions of nodes.

In this paper, we address two main questions: How to compare two networks efficiently? How to evaluate 

their similarity score? Our main contributions are the following: 

1. Axioms/Properties: we formalize the axioms and properties that a similarity measure must conform to. 

2. Algorithm: we propose DeltaCon for measuring connectivity differences between two graphs, and show that it is: (a) principled, conforming to all the axioms presented in Section 2, (b) intuitive, giving similarity scores that agree with common sense and can be easily explained, and (c) scalable, able to handle large-scale graphs.

* Computer Science Department, Carnegie Mellon University.
† Department of Statistical Science, Duke University.








Figure 1: (a) Connectome: neural network of the brain. Different colors correspond to each of the 70 cortical regions, whose centers are depicted by vertices. Connections between the regions are shown by edges. DeltaCon is used for clustering and classification. (b) Dendrogram representing the hierarchical clustering of the DeltaCon similarities between the 114 connectomes. The connectomes are nicely classified into two big clusters by hierarchical clustering. The classification is based on the pairwise DeltaCon similarities between the 114 connectomes that we study. Elements in red correspond to high artistic score; thus, DeltaCon shows that artistic brains seem to have different wiring than the rest.

3. Experiments: we report experiments on synthetic and real datasets, and compare DeltaCon to six state- 
of-the-art methods that apply to our setting. 

4. Applications: We use DeltaCon for real-world applications, such as temporal anomaly detection and clustering/classification. In Fig. 1, DeltaCon is used for clustering brain graphs corresponding to 114 individuals; the two big clusters, which differ in terms of connectivity, correspond to people with high and low creativity. More details are given in Sec. 5.

The paper is organized as follows: Section 2 presents the intuition behind our method, and the axioms and desired properties of a similarity measure; Sec. 3 has the proposed algorithms; experiments on synthetic and big real networks are in Sec. 4; Sec. 5 presents two real-world applications; the related work and the conclusions are in Sec. 6 and 7, respectively. Finally, Table 1 presents the major symbols we use in the paper and their definitions.

2 Proposed Method: Intuition 

How can we find the similarity in connectivity between two graphs, or more formally how can we solve the 
following problem? 

Problem 1. DeltaConnectivity

Given: (a) two graphs, $G_1(\mathcal{V}, \mathcal{E}_1)$ and $G_2(\mathcal{V}, \mathcal{E}_2)$, with the same node set¹ $\mathcal{V}$ and different edge sets $\mathcal{E}_1$ and $\mathcal{E}_2$, and (b) the node correspondence.
Find: a similarity score, $sim(G_1, G_2) \in [0, 1]$, between the input graphs. A similarity score of 0 means totally different graphs, while 1 means identical graphs.

The obvious way to solve this problem is by measuring the overlap of their edges. Why does this often not work in practice? Consider the following example: according to the overlap method, the pairs of barbell graphs shown in Fig. 2, (B10, mB10) and (B10, mmB10), have the same similarity score. But, clearly, from the point of view of information flow, a missing edge from a clique (mB10) does not play as important a role in the graph connectivity as the missing "bridge" in mmB10. So, could we instead measure the differences in the 1-step away neighborhoods, 2-step away neighborhoods, etc.? If yes, with what weight? It turns out (Intuition 1) that our method does exactly this in a principled way.

¹If the graphs have different, but overlapping, node sets $\mathcal{V}_1$ and $\mathcal{V}_2$, we assume that $\mathcal{V} = \mathcal{V}_1 \cup \mathcal{V}_2$, and the extra nodes are treated as singletons.

Table 1: Symbols and Definitions. Bold capital letters: matrices, lowercase letters with arrows: vectors, plain font: scalars.

Symbol | Description
$G$ | graph
$\mathcal{V}$, $n$ | set of nodes, number of nodes
$\mathcal{E}$, $m$ | set of edges, number of edges
$sim(G_1, G_2)$ | similarity between graphs $G_1$ and $G_2$
$d(G_1, G_2)$ | distance between graphs $G_1$ and $G_2$
$\mathbf{I}$ | $n \times n$ identity matrix
$\mathbf{A}$ | $n \times n$ adjacency matrix with elements $a_{ij}$
$\mathbf{D}$ | $n \times n$ diagonal degree matrix, $d_{ii} = \sum_j a_{ij}$
$\mathbf{L}$ | $= \mathbf{D} - \mathbf{A}$, Laplacian matrix
$\mathbf{S}$ | $n \times n$ matrix of final scores with elements $s_{ij}$
$\mathbf{S}'$ | $n \times g$ reduced matrix of final scores
$\vec{e}_i$ | $n \times 1$ unit vector with 1 in the $i$-th element
$\vec{s}_{0k}$ | $n \times 1$ vector of seed scores for group $k$
$\vec{s}_i$ | $n \times 1$ vector of final affinity scores to node $i$
$g$ | number of groups (node partitions)
$\epsilon$ | $= 1/(1 + \max_i(d_{ii}))$, positive constant ($< 1$) encoding the influence between neighbors
DC0, DC | DeltaCon0, DeltaCon
VEO | Vertex/Edge Overlap
GED | Graph Edit Distance [4]
SS | Signature Similarity [5]
λ-d Adj. / Lap. / N.L. | λ-distance on $\mathbf{A}$ / $\mathbf{L}$ / normalized $\mathbf{L}$

2.1 Fundamental Concept. The first conceptual step of our proposed method is to compute the pairwise node affinities in the first graph, and compare them with the ones in the second graph. For notational compactness, we store them in an $n \times n$ similarity matrix² $\mathbf{S}$. The $s_{ij}$ entry of the matrix indicates the influence node $i$ has on node $j$. For example, in a who-knows-whom network, if node $i$ is, say, Republican and if we assume homophily (i.e., neighbors are similar), how likely is it that node $j$ is also Republican? Intuitively, node $i$ has more influence/affinity to node $j$ if there are many short, heavily weighted paths from node $i$ to $j$.

The second conceptual step is to measure the differences in the corresponding node affinity scores of the two 
graphs and report the result as their similarity score. 

2.2 How to measure node affinity? PageRank, personalized Random Walks with Restarts (RWR), lazy RWR, and the "electrical network analogy" technique are only a few of the methods that compute node affinities. We could have used Personalized RWR: $[\mathbf{I} - (1 - c)\mathbf{A}\mathbf{D}^{-1}]\vec{s}_i = c\,\vec{e}_i$, where $c$ is the probability of restarting the random walk from the initial node, $\vec{e}_i$ the starting (seed) indicator vector (all zeros except 1 at position $i$), and $\vec{s}_i$ the unknown Personalized PageRank column vector. Specifically, $s_{ij}$ is the affinity of node $j$ w.r.t. node $i$. For reasons that we explain next, we chose to use a more recent and principled method, the so-called Fast Belief Propagation (FaBP), which is identical to Personalized RWR under specific conditions (see Theorem 2 in Appendix A.2 of [16]). We use a simplified form of it (see Appendix A.1 in [16]) given by:

(2.1)    $[\mathbf{I} + \epsilon^2\mathbf{D} - \epsilon\mathbf{A}]\,\vec{s}_i = \vec{e}_i$

where $\vec{s}_i = [s_{i1}, \ldots, s_{in}]^T$ is the column vector of final similarity/influence scores starting from the $i$-th node, $\epsilon$ is a small constant capturing the influence between neighboring nodes, $\mathbf{I}$ is the identity matrix, $\mathbf{A}$ is the adjacency matrix and $\mathbf{D}$ is the diagonal matrix with the degree of node $i$ as the $d_{ii}$ entry.

An equivalent, more compact notation is to use a matrix form, and to stack all the $\vec{s}_i$ vectors ($i = 1, \ldots, n$) into the $n \times n$ matrix $\mathbf{S}$. We can easily prove that

(2.2)    $\mathbf{S} = [s_{ij}] = [\mathbf{I} + \epsilon^2\mathbf{D} - \epsilon\mathbf{A}]^{-1}$
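A minimal sketch of Eq. (2.1)-(2.2), assuming an undirected, unweighted graph stored as a Python dict of neighbor sets with nodes labeled 0..n-1 (the function names are ours, not from the paper). Each system is solved with a simple fixed-point (Jacobi-style) iteration, used here as one possible stand-in for the power method mentioned in Section 3.1; every sweep touches each edge once.

```python
def fabp_epsilon(adj):
    """epsilon = 1 / (1 + max_i d_ii), as defined in Table 1."""
    return 1.0 / (1.0 + max(len(nbrs) for nbrs in adj.values()))


def solve_fabp(adj, rhs, eps, tol=1e-12, max_iter=10_000):
    """Solve [I + eps^2 D - eps A] x = rhs by iterating x <- rhs + (eps A - eps^2 D) x.

    Every row of (eps A - eps^2 D) has absolute sum eps * d_i * (1 + eps) < 1
    when eps = 1 / (1 + max degree), so the iteration is a contraction.
    """
    n = len(adj)
    x = list(rhs)
    for _ in range(max_iter):
        new = [
            rhs[i] + eps * sum(x[j] for j in adj[i]) - eps * eps * len(adj[i]) * x[i]
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("fixed-point iteration did not converge")


def affinity_matrix(adj):
    """Return S of Eq. (2.2) as a list of rows.

    Row i is the solution s_i for the seed e_i; because A and D are symmetric
    for an undirected graph, S is symmetric, so row i also equals column i.
    """
    n = len(adj)
    eps = fabp_epsilon(adj)
    return [solve_fabp(adj, [1.0 if k == i else 0.0 for k in range(n)], eps) for i in range(n)]


# Path graph 0 - 1 - 2, so eps = 1 / 3
adj = {0: {1}, 1: {0, 2}, 2: {1}}
S = affinity_matrix(adj)
print([round(v, 4) for v in S[0]])
```

For the 3-node path, the affinities of node 0 to nodes 0, 1 and 2 come out as 90.9/92, 27/92 and 8.1/92 (about 0.988, 0.293 and 0.088), i.e. they decay with the distance from the seed.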



2.3 Why use Belief Propagation? The reasons we choose BP and its fast approximation with Eq. (2.2) are: (a) it is based on sound theoretical background (maximum likelihood estimation on marginals), (b) it is fast (linear in the number of edges), and (c) it agrees with intuition, taking into account not only direct neighbors, but also 2-, 3- and $k$-step-away neighbors, with decreasing weight. We elaborate on the last reason next:

Intuition 1. [Attenuating Neighboring Influence]

By temporarily ignoring the term $\epsilon^2\mathbf{D}$ in (2.2), we can expand the matrix inversion and approximate the $n \times n$ matrix of pairwise affinities, $\mathbf{S}$, as

$\mathbf{S} \approx [\mathbf{I} - \epsilon\mathbf{A}]^{-1} \approx \mathbf{I} + \epsilon\mathbf{A} + \epsilon^2\mathbf{A}^2 + \ldots$

As we said, our method captures the differences in the 1-step, 2-step, 3-step, etc. neighborhoods in a weighted way; differences in long paths have a smaller effect on the computation of the similarity measure than differences in short paths. Recall that $\epsilon < 1$, and that $\mathbf{A}^k$ has information about the $k$-step paths. Notice that this is just the intuition behind our method; we do not use this simplified formula to find the matrix $\mathbf{S}$.
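To make the attenuation concrete, here is a hedged numerical sketch that evaluates the truncated series $\sum_{k=0}^{K}(\epsilon\mathbf{A})^k\vec{e}_0$ on a 5-node path graph with $\epsilon = 1/(1 + \max_i d_{ii}) = 1/3$. As stated above, this only illustrates the intuition (the $\epsilon^2\mathbf{D}$ term is dropped); the dense list-of-rows representation and the helper names are illustrative assumptions.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def truncated_affinity(A, seed, eps, K):
    """Sum_{k=0..K} (eps*A)^k e_seed: the k-th term weights k-step walks by eps^k."""
    n = len(A)
    term = [1.0 if i == seed else 0.0 for i in range(n)]  # k = 0 term
    total = term[:]
    for _ in range(K):
        term = [eps * v for v in matvec(A, term)]          # walks one step longer
        total = [t + v for t, v in zip(total, term)]
    return total


# Path graph P5: 0 - 1 - 2 - 3 - 4 (maximum degree 2)
n = 5
A = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
eps = 1.0 / (1 + 2)
s0 = truncated_affinity(A, seed=0, eps=eps, K=40)
print([round(v, 4) for v in s0])
```

The series converges to the exact column $[\mathbf{I} - \epsilon\mathbf{A}]^{-1}\vec{e}_0 = (55, 21, 8, 3, 1)/48$, so the affinity of node 0 shrinks by a factor of roughly 2.6 to 3 with every extra hop.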



²In reality, we do not compute all the affinities (see Section 3.2 for an efficient approximation).



2.4 Which properties should a similarity measure satisfy? Let $G_1(\mathcal{V}, \mathcal{E}_1)$ and $G_2(\mathcal{V}, \mathcal{E}_2)$ be two graphs, and $sim(G_1, G_2) \in [0, 1]$ denote their similarity score. Then, we want the similarity measure to obey the following axioms:

A1. Identity property: $sim(G_1, G_1) = 1$
A2. Symmetric property: $sim(G_1, G_2) = sim(G_2, G_1)$
A3. Zero property: $sim(G_1, G_2) \to 0$ for $n \to \infty$, where $G_1$ is the clique graph ($K_n$), and $G_2$ is the empty graph (i.e., the edge sets are complementary).

Moreover, the measure must be: 

(a) intuitive. It should satisfy the following desired properties:

P1. [Edge Importance] Changes that create disconnected components should be penalized more than changes that maintain the connectivity properties of the graphs.
P2. [Weight Awareness] In weighted graphs, the bigger the weight of the removed edge is, the greater the impact on the similarity measure should be.
P3. [Edge-"Submodularity"] A specific change is more important in a graph with few edges than in a much denser, but equally sized graph.
P4. [Focus Awareness] Random changes in graphs are less important than targeted changes of the same extent.

(b) scalable. The huge size of the generated graphs, as well as their abundance, require a similarity measure that is computed fast and handles graphs with billions of nodes.

3 Proposed Method: Details 

Now that we have described the high level ideas behind our method, we move on to the details. 

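3.1 Algorithm Description. Let the graphs we compare be $G_1(\mathcal{V}, \mathcal{E}_1)$ and $G_2(\mathcal{V}, \mathcal{E}_2)$. If the graphs have different node sets, say $\mathcal{V}_1$ and $\mathcal{V}_2$, we assume that $\mathcal{V} = \mathcal{V}_1 \cup \mathcal{V}_2$, where some nodes are disconnected.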

As mentioned before, the main idea behind our proposed similarity algorithm is to compare the node affinities 
in the given graphs. The steps of our similarity method are: 

Step 1. By Eq. (2.2), we compute for each graph the $n \times n$ matrix of pairwise node affinity scores ($\mathbf{S}_1$ and $\mathbf{S}_2$ for graphs $G_1$ and $G_2$, respectively).

Step 2. Among the various distance and similarity measures (e.g., Euclidean distance (ED), cosine similarity, correlation) found in the literature, we use the root Euclidean distance (RootED, a.k.a. Matusita distance)

(3.3)    $d = \text{RootED}(\mathbf{S}_1, \mathbf{S}_2) = \sqrt{\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\sqrt{s_{1,ij}} - \sqrt{s_{2,ij}}\right)^2}$

We use the RootED distance for the following reasons:

1. it is very similar to the Euclidean distance (ED), the only difference being the square root of the pairwise similarities ($s_{ij}$),

2. it usually gives better results, because it "boosts" the node affinities³ and, therefore, detects even small changes in the graphs (other distance measures, including ED, suffer from high similarity scores no matter how much the graphs differ), and

3. it satisfies the desired properties P1-P4. As discussed in Appendix A.5 of [16], at least P1 is not satisfied by the ED.
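Eq. (3.3) translates directly into code. The sketch below (plain Python, matrices as lists of rows; the function names are ours) also computes the ordinary Euclidean distance, to illustrate reason 2 on a single pair of small affinities, 0.01 versus 0.0001: ED reports about 0.0099, whereas RootED reports about 0.09.

```python
import math


def rooted(S1, S2):
    """RootED (Matusita) distance of Eq. (3.3) between two non-negative
    matrices of the same shape, each given as a list of rows."""
    if len(S1) != len(S2) or any(len(r1) != len(r2) for r1, r2 in zip(S1, S2)):
        raise ValueError("affinity matrices must have the same shape")
    return math.sqrt(sum(
        (math.sqrt(a) - math.sqrt(b)) ** 2
        for r1, r2 in zip(S1, S2)
        for a, b in zip(r1, r2)
    ))


def euclidean(S1, S2):
    """Plain Euclidean (Frobenius) distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for r1, r2 in zip(S1, S2) for a, b in zip(r1, r2)))


# Two small affinities that differ by a factor of 100
S1, S2 = [[0.01]], [[0.0001]]
print(euclidean(S1, S2), rooted(S1, S2))  # ED ~ 0.0099, RootED ~ 0.09
```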

³The node affinities are in [0, 1], so the square root makes them bigger.



Step 3. For interpretability, we convert the distance ($d$) to a similarity measure ($sim$) via the formula $sim = \frac{1}{1+d}$. The result is bounded to the interval (0, 1], as opposed to the distance, which is unbounded in $[0, \infty)$. Notice that the distance-to-similarity transformation does not change the ranking of results in a nearest-neighbor query.
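Because $1/(1+d)$ is strictly decreasing in $d$, sorting candidates by increasing distance or by decreasing similarity gives the same order. A tiny sketch, with hypothetical graph names and distances:

```python
def to_similarity(d):
    """Step 3: map a distance d in [0, inf) to a similarity 1 / (1 + d) in (0, 1]."""
    if d < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + d)


# Hypothetical distances from a query graph to three candidate graphs
distances = {"G_a": 0.8, "G_b": 0.1, "G_c": 2.5}
by_distance = sorted(distances, key=distances.get)                             # nearest first
by_similarity = sorted(distances, key=lambda g: -to_similarity(distances[g]))  # most similar first
print(by_distance, by_similarity)  # same order: ['G_b', 'G_a', 'G_c'] twice
```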

The straightforward algorithm, DeltaCon0 (Algorithm 1), is to compute all the $n^2$ affinity scores of matrix $\mathbf{S}$ by simply using Eq. (2.2). We can do the inversion using the Power Method or any other efficient method.



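Algorithm 1 DeltaCon0

INPUT: edge files of $G_1(\mathcal{V}, \mathcal{E}_1)$ and $G_2(\mathcal{V}, \mathcal{E}_2)$
// $\mathcal{V} = \mathcal{V}_1 \cup \mathcal{V}_2$, if $\mathcal{V}_1$ and $\mathcal{V}_2$ are the graphs' node sets
$\mathbf{S}_1 = [\mathbf{I} + \epsilon^2\mathbf{D}_1 - \epsilon\mathbf{A}_1]^{-1}$   // $s_{1,ij}$: affinity/influence of
$\mathbf{S}_2 = [\mathbf{I} + \epsilon^2\mathbf{D}_2 - \epsilon\mathbf{A}_2]^{-1}$   // node $i$ to node $j$ in $G_1$
$d(G_1, G_2)$ = RootED($\mathbf{S}_1$, $\mathbf{S}_2$)
return $sim(G_1, G_2) = \frac{1}{1+d}$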
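A minimal, dense sketch of Algorithm 1 in plain Python, assuming nodes labeled 0..n-1 and undirected, unweighted edge lists; the function names and the choice of computing $\epsilon$ separately for each graph are our assumptions. It inverts $\mathbf{I} + \epsilon^2\mathbf{D} - \epsilon\mathbf{A}$ directly, so it only suits toy graphs such as the 10-node barbell B10 of Fig. 2 (two 5-cliques joined by a bridge). On that graph it reproduces the qualitative P1 behavior: removing the bridge lowers the similarity more than removing a clique edge.

```python
import math


def fabp_matrix(n, edges):
    """Dense M = I + eps^2 D - eps A for an undirected, unweighted edge list.

    Assumption: eps = 1 / (1 + max degree) is computed separately for each graph.
    """
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    deg = [sum(row) for row in A]
    eps = 1.0 / (1.0 + max(deg))
    return [[(1.0 + eps * eps * deg[i] if i == j else 0.0) - eps * A[i][j]
             for j in range(n)] for i in range(n)]


def invert(M):
    """Gauss-Jordan inversion with partial pivoting (O(n^3): small graphs only)."""
    n = len(M)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def rooted(S1, S2):
    # max(., 0) guards against tiny negative round-off before the square root
    return math.sqrt(sum((math.sqrt(max(a, 0.0)) - math.sqrt(max(b, 0.0))) ** 2
                         for r1, r2 in zip(S1, S2) for a, b in zip(r1, r2)))


def deltacon0(n, edges1, edges2):
    """Algorithm 1: exact affinities, RootED distance, then sim = 1 / (1 + d)."""
    S1 = invert(fabp_matrix(n, edges1))
    S2 = invert(fabp_matrix(n, edges2))
    return 1.0 / (1.0 + rooted(S1, S2))


def barbell(k):
    """Two k-cliques joined by the single bridge (k-1, k); 2k nodes in total."""
    left = [(i, j) for i in range(k) for j in range(i + 1, k)]
    return left + [(i + k, j + k) for i, j in left] + [(k - 1, k)]


B10 = barbell(5)
mB10 = [e for e in B10 if e != (0, 1)]    # drop an edge inside a clique
mmB10 = [e for e in B10 if e != (4, 5)]   # drop the bridge
print(deltacon0(10, B10, mB10), deltacon0(10, B10, mmB10))
```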



3.2 Scalability Analysis. DeltaCon0 satisfies all the properties in Section 2, but it is quadratic ($n^2$ affinity scores $s_{ij}$, using the power method for the inversion of the sparse matrix) and thus not scalable. We present a faster, linear algorithm, DeltaCon (Algorithm 2), which approximates DeltaCon0 and differs in the first step. We still want each node to become a seed exactly once in order to find the affinities of the rest of the nodes to it; but, here, we have multiple seeds at once, instead of having one seed at a time. The idea is to randomly divide our node set into $g$ groups, and compute the affinity score of each node $i$ to group $k$, thus requiring only $n \times g$ scores, which are stored in the $n \times g$ matrix $\mathbf{S}'$ ($g \ll n$). Intuitively, instead of using the $n \times n$ affinity matrix $\mathbf{S}$, we add up the scores of the columns that correspond to the nodes of a group, and obtain the $n \times g$ matrix $\mathbf{S}'$. The score $s'_{ik}$ is the affinity of node $i$ to the $k$-th group of nodes ($k = 1, \ldots, g$).

Lemma 3.1. The time complexity of computing the reduced affinity matrix, $\mathbf{S}'$, is linear in the number of edges.

Proof. We can compute the $n \times g$ "skinny" matrix $\mathbf{S}'$ quickly, by solving $[\mathbf{I} + \epsilon^2\mathbf{D} - \epsilon\mathbf{A}]\,\mathbf{S}' = [\vec{s}_{01} \ldots \vec{s}_{0g}]$, where $\vec{s}_{0k} = \sum_{i \in group_k} \vec{e}_i$ is the membership $n \times 1$ vector for group $k$ (all 0's, except 1's for members of the group). □
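The proof relies on linearity: a single solve with the membership vector $\vec{s}_{0k}$ yields the sum of the single-seed solutions over group $k$, i.e. the summed columns of $\mathbf{S}$. The sketch below checks this numerically on a 6-cycle. It uses a plain dense solver purely for illustration (names are ours); the linear-time claim of the lemma requires a sparse iterative solver whose sweeps touch each edge once, so this code does not itself demonstrate the complexity.

```python
def fabp_matrix(n, edges):
    """Dense I + eps^2 D - eps A, with eps = 1 / (1 + max degree)."""
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1.0
    deg = [sum(row) for row in A]
    eps = 1.0 / (1.0 + max(deg))
    return [[(1.0 + eps * eps * deg[i] if i == j else 0.0) - eps * A[i][j]
             for j in range(n)] for i in range(n)]


def solve(M, b):
    """Gaussian elimination with partial pivoting; returns x such that M x = b."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def reduced_affinities(M, groups):
    """Columns of S': column k solves M s'_k = s_0k, the 0/1 membership vector of group k."""
    n = len(M)
    return [solve(M, [1.0 if i in members else 0.0 for i in range(n)]) for members in groups]


# 6-cycle, nodes split into g = 2 groups
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]
M = fabp_matrix(n, edges)
groups = [{0, 2, 4}, {1, 3, 5}]
S_prime = reduced_affinities(M, groups)

# Same result as summing the single-seed columns of S over each group
single = [solve(M, [1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
summed = [[sum(single[j][i] for j in members) for i in range(n)] for members in groups]
print(max(abs(a - b) for c1, c2 in zip(S_prime, summed) for a, b in zip(c1, c2)))
```

The printed maximum discrepancy is at the level of floating-point round-off.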

Thus, we compute $g$ final scores per node, which denote its affinity to every group of seeds, instead of every seed node that we had in Eq. (2.2). With careful implementation, DeltaCon is linear in the number of edges and groups $g$. As we show in Section 4.2, it takes ~160 sec, on commodity hardware, for a 1.6-million-node graph.

Once we have the reduced affinity matrices $\mathbf{S}'_1$ and $\mathbf{S}'_2$ of the two graphs, we use RootED to find the similarity between the $n \times g$ matrices of final scores, where $g \ll n$.

Lemma 3.2. The time complexity of DeltaCon, when applied in parallel to the input graphs, is linear in the number of edges in the graphs, i.e. $O(g \cdot \max\{m_1, m_2\})$.

Proof. Based on Lemma 3.1. See Appendix A.3 in [16]. □

Theorem 1. DeltaCon's similarity score between any two graphs $G_1$, $G_2$ upper bounds the actual DeltaCon0 similarity score, i.e. $sim_{DC_0}(G_1, G_2) \le sim_{DC}(G_1, G_2)$.

Proof. Intuitively, grouping nodes blurs the influence information and makes the nodes seem more similar than originally. For more details, see Appendix A.3 of [16]. □



In the following section we show that DeltaCon (which includes DeltaCon0 as a special case for $g = n$) satisfies the axioms and properties, while in the Appendix (A.4 and A.5 in [16]) we provide the proofs.



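Algorithm 2 DeltaCon

INPUT: edge files of $G_1(\mathcal{V}, \mathcal{E}_1)$ and $G_2(\mathcal{V}, \mathcal{E}_2)$ and
$g$ (groups: # of node partitions)
$\{\mathcal{V}_j\}_{j=1}^{g}$ = random_partition($\mathcal{V}$, $g$)   // $g$ groups
// estimate affinity vector of nodes $i = 1, \ldots, n$ to group $k$
for $k = 1 \to g$ do
    $\vec{s}_{0k} = \sum_{i \in \mathcal{V}_k} \vec{e}_i$
    solve $[\mathbf{I} + \epsilon^2\mathbf{D}_1 - \epsilon\mathbf{A}_1]\,\vec{s}'_{1k} = \vec{s}_{0k}$
    solve $[\mathbf{I} + \epsilon^2\mathbf{D}_2 - \epsilon\mathbf{A}_2]\,\vec{s}'_{2k} = \vec{s}_{0k}$
end for
$\mathbf{S}'_1 = [\vec{s}'_{11}\ \vec{s}'_{12} \ldots \vec{s}'_{1g}]$, $\mathbf{S}'_2 = [\vec{s}'_{21}\ \vec{s}'_{22} \ldots \vec{s}'_{2g}]$
// compare affinity matrices $\mathbf{S}'_1$ and $\mathbf{S}'_2$
$d(G_1, G_2)$ = RootED($\mathbf{S}'_1$, $\mathbf{S}'_2$)
return $sim(G_1, G_2) = \frac{1}{1+d}$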
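Putting the pieces together, the following is a hedged, pure-Python sketch of Algorithm 2 (all names are ours; undirected, unweighted edge lists on nodes 0..n-1 are assumed). Each linear system is solved with a fixed-point iteration whose sweeps cost $O(n + m)$, in the spirit of Lemma 3.1. Setting $g = n$ makes every node its own group, which recovers DeltaCon0, so the example can also check Theorem 1 numerically. One way to see why the theorem holds: the affinities are non-negative (the system matrix is strictly diagonally dominant with non-positive off-diagonal entries, so its inverse is entrywise non-negative), and for non-negative $a_j, b_j$ the Cauchy-Schwarz inequality gives $(\sqrt{\sum_j a_j} - \sqrt{\sum_j b_j})^2 \le \sum_j (\sqrt{a_j} - \sqrt{b_j})^2$, so summing columns within a group can only shrink the RootED distance.

```python
import math
import random


def adjacency(n, edges):
    """Neighbor sets for an undirected, unweighted graph on nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def solve_fabp(adj, rhs, tol=1e-13, max_iter=100_000):
    """Fixed-point solve of [I + eps^2 D - eps A] x = rhs; each sweep is O(n + m)."""
    eps = 1.0 / (1.0 + max(len(nbrs) for nbrs in adj))
    x = list(rhs)
    for _ in range(max_iter):
        new = [rhs[i] + eps * sum(x[j] for j in adj[i]) - eps * eps * len(adj[i]) * x[i]
               for i in range(len(adj))]
        if max(abs(p - q) for p, q in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("fixed-point iteration did not converge")


def random_partition(n, g, rng):
    """Shuffle nodes 0..n-1 and deal them into g groups (non-empty if g <= n)."""
    nodes = list(range(n))
    rng.shuffle(nodes)
    return [nodes[k::g] for k in range(g)]


def deltacon(n, edges1, edges2, g, seed=0):
    """Algorithm 2: n x g reduced affinities per graph, RootED, sim = 1 / (1 + d).

    The squared RootED distance is accumulated one group column at a time,
    so only one column per graph is held in memory.
    """
    adj1, adj2 = adjacency(n, edges1), adjacency(n, edges2)
    squared = 0.0
    for members in random_partition(n, g, random.Random(seed)):
        s0k = [0.0] * n
        for i in members:
            s0k[i] = 1.0
        col1, col2 = solve_fabp(adj1, s0k), solve_fabp(adj2, s0k)
        squared += sum((math.sqrt(max(a, 0.0)) - math.sqrt(max(b, 0.0))) ** 2
                       for a, b in zip(col1, col2))
    return 1.0 / (1.0 + math.sqrt(squared))


# Barbell B10 (two 5-cliques joined by the bridge (4, 5)) and B10 without its bridge
left = [(i, j) for i in range(5) for j in range(i + 1, 5)]
B10 = left + [(i + 5, j + 5) for i, j in left] + [(4, 5)]
mmB10 = [e for e in B10 if e != (4, 5)]
sim_dc = deltacon(10, B10, mmB10, g=2)     # DeltaCon with 2 groups
sim_dc0 = deltacon(10, B10, mmB10, g=10)   # g = n: one seed per group, i.e. DeltaCon0
print(sim_dc, sim_dc0)                     # Theorem 1: sim_dc >= sim_dc0
```

Accumulating the distance group by group is a design choice of this sketch: it keeps memory at $O(n)$ per graph instead of storing the full $n \times g$ matrices.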



Table 2: Real and Synthetic Datasets

Name | Nodes | Edges | Description
Brain Graphs | 70 | 800-1,208 | connectome
Enron Email [7] | 36,692 | 367,662 | who-emails-whom
Epinions [8] | 131,828 | 841,372 | who-trusts-whom
Email EU [9] | 265,214 | 420,045 | who-sent-to-whom
Web Google [10] | 875,714 | 5,105,039 | site-to-site
AS skitter [9] | 1,696,415 | 11,095,298 | p2p links
Kronecker 1 | 6,561 | 65,536 | synthetic
Kronecker 2 | 19,683 | 262,144 | synthetic
Kronecker 3 | 59,049 | 1,048,576 | synthetic
Kronecker 4 | 177,147 | 4,194,304 | synthetic
Kronecker 5 | 531,441 | 16,777,216 | synthetic
Kronecker 6 | 1,594,323 | 67,108,864 | synthetic



4 Experiments 

We conduct several experiments on synthetic and real data (undirected, unweighted graphs, unless stated otherwise; see Table 2) to answer the following questions:

Q1. Does DeltaCon agree with our intuition and satisfy the axioms/properties? Where do other methods fail?
Q2. Is DeltaCon scalable?

The implementation is in Matlab and we ran the experiments on an AMD Opteron Processor 854 @ 3GHz with 32GB of RAM.



4.1 Intuitiveness of DeltaCon. To answer Q1, for the first 3 properties (P1-P3), we conduct experiments on small graphs of 5 to 100 nodes and classic topologies (cliques, stars, circles, paths, barbell and wheel-barbell graphs, and "lollipops" shown in Fig. 2), since people can argue about their similarities. For the name conventions see Table 3. For our method we used $g = 5$ groups, but the results are similar for other choices of the parameter. In addition to the synthetic graphs, for the last property (P4), we use real networks with up to 11 million edges (Table 2).

Figure 2: Small synthetic graphs. K: clique, C: cycle, P: path, S: star, B: barbell, L: lollipop, WhB: wheel-barbell.

Table 3: Name conventions for small synthetic graphs. A missing number after the prefix implies X = 1.

Symbol | Meaning
$K_n$ | clique of size n
$P_n$ | path of size n
$C_n$ | cycle of size n
$S_n$ | star of size n
$L_n$ | lollipop of size n
$B_n$ | barbell of size n
$WhB_n$ | wheel barbell of size n
mX | missing X edges
mmX | missing X "bridge" edges
w | weight of "bridge" edge

We compare our method, DeltaCon, to the 6 best state-of-the-art similarity measures that apply to our setting:

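1. Vertex/Edge Overlap (VEO) [5]: For two graphs $G_1(\mathcal{V}_1, \mathcal{E}_1)$ and $G_2(\mathcal{V}_2, \mathcal{E}_2)$:
$sim_{VEO}(G_1, G_2) = 2\,\frac{|\mathcal{V}_1 \cap \mathcal{V}_2| + |\mathcal{E}_1 \cap \mathcal{E}_2|}{|\mathcal{V}_1| + |\mathcal{V}_2| + |\mathcal{E}_1| + |\mathcal{E}_2|}$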

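2. Graph Edit Distance (GED) [4]: GED has quadratic complexity in general, so they [4] consider the case where only insertions and deletions are allowed:

$sim_{GED}(G_1, G_2) = |\mathcal{V}_1| + |\mathcal{V}_2| - 2|\mathcal{V}_1 \cap \mathcal{V}_2| + |\mathcal{E}_1| + |\mathcal{E}_2| - 2|\mathcal{E}_1 \cap \mathcal{E}_2|$.

For $\mathcal{V}_1 = \mathcal{V}_2$ and unweighted graphs, $sim_{GED}$ is equivalent to hamming distance($\mathbf{A}_1$, $\mathbf{A}_2$) = sum($\mathbf{A}_1$ XOR $\mathbf{A}_2$).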
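The two overlap-based baselines take only a few lines. The sketch below (our own helper names; graphs as Python sets of nodes and of undirected edges) also reproduces the failure discussed in Section 2: on the 10-node barbell B10 (two 5-cliques plus a bridge), removing a clique edge and removing the bridge give exactly the same VEO and GED scores.

```python
def veo_similarity(V1, E1, V2, E2):
    """Vertex/Edge Overlap: 2(|V1 & V2| + |E1 & E2|) / (|V1| + |V2| + |E1| + |E2|)."""
    return 2 * (len(V1 & V2) + len(E1 & E2)) / (len(V1) + len(V2) + len(E1) + len(E2))


def ged(V1, E1, V2, E2):
    """Insertion/deletion-only graph edit distance: sizes of the symmetric differences."""
    return len(V1 ^ V2) + len(E1 ^ E2)


def edge_set(edges):
    """Undirected edges as frozensets, so (u, v) and (v, u) are the same edge."""
    return {frozenset(e) for e in edges}


# Barbell B10: two 5-cliques joined by the bridge (4, 5)
left = [(i, j) for i in range(5) for j in range(i + 1, 5)]
B10 = edge_set(left + [(i + 5, j + 5) for i, j in left] + [(4, 5)])
mB10 = B10 - {frozenset((0, 1))}    # a clique edge removed
mmB10 = B10 - {frozenset((4, 5))}   # the bridge removed
V = set(range(10))

print(veo_similarity(V, B10, V, mB10), veo_similarity(V, B10, V, mmB10))  # both 60/61
print(ged(V, B10, V, mB10), ged(V, B10, V, mmB10))                          # both 1
```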

3. Signature Similarity (SS) [5]: This is the best performing similarity measure studied in [5]. It is based on the SimHash algorithm (a random-projection-based method).

4. The last 3 methods are variations of the well-studied spectral method "λ-distance" ([4], [11], [12]). Let $\{\lambda_{1i}\}_{i=1}^{|\mathcal{V}_1|}$ and $\{\lambda_{2i}\}_{i=1}^{|\mathcal{V}_2|}$ be the eigenvalues of the matrices that represent $G_1$ and $G_2$. Then, λ-distance is given by

$d_\lambda(G_1, G_2) = \sqrt{\sum_{i=1}^{k} (\lambda_{1i} - \lambda_{2i})^2}$,

where $k$ is $\max(|\mathcal{V}_1|, |\mathcal{V}_2|)$ (padding is required for the smaller vector of eigenvalues). The variations of the method are based on three different matrix representations of the graphs: adjacency (λ-d Adj.), Laplacian (λ-d Lap.) and normalized Laplacian matrix (λ-d N.L.).
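Given the two spectra, the λ-distance is a one-liner once a padding convention is fixed. In the sketch below, which is an assumption rather than the baselines' exact code, the shorter spectrum is padded with zeros (the eigenvalue an isolated node contributes to A or L), and both spectra are sorted in descending order before being compared; computing the eigenvalues themselves is left to a linear-algebra library. The example uses the closed-form adjacency spectra of the triangle K3 (2, -1, -1) and the path P3 (√2, 0, -√2).

```python
import math


def lambda_distance(eig1, eig2):
    """lambda-distance between two spectra given as lists of eigenvalues."""
    k = max(len(eig1), len(eig2))
    a = sorted(list(eig1) + [0.0] * (k - len(eig1)), reverse=True)  # pad, then sort
    b = sorted(list(eig2) + [0.0] * (k - len(eig2)), reverse=True)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


k3 = [2.0, -1.0, -1.0]                       # adjacency spectrum of K3
p3 = [math.sqrt(2.0), 0.0, -math.sqrt(2.0)]  # adjacency spectrum of P3
print(lambda_distance(k3, p3))               # about 1.2307
print(lambda_distance([3.0], [3.0, 0.0]))    # 0.0: padding acts like an isolated node
```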



Table 4: "Edge Importance" (P1). Non-positive entries violate P1.

A | B | C | DC0 | DC | VEO | SS | GED (XOR) | λ-d Adj. | λ-d Lap. | λ-d N.L.
(DC0, DC, VEO, SS report Δs = sim(A,B) - sim(A,C); GED and λ-d report Δd = d(A,C) - d(A,B))
B10 | mB10 | mmB10 | 0.07 | 0.04 | 0 | -10^-… | 0 | 0.21 | -0.27 | 2.14
L10 | mL10 | mmL10 | 0.04 | 0.02 | 0 | 10^-… | 0 | -0.30 | -0.43 | -8.23
WhB10 | mWhB10 | mmWhB10 | 0.03 | 0.01 | 0 | -10^-… | 0 | 0.22 | 0.18 | -0.41
WhB10 | m2WhB10 | mm2WhB10 | 0.07 | 0.04 | 0 | -10^-… | 0 | 0.59 | 0.41 | 0.87

Table 5: "Weight Awareness" (P2). Non-positive entries violate P2.

A | B | C | D | DC0 | DC | VEO | SS | GED (XOR) | λ-d Adj. | λ-d Lap. | λ-d N.L.
(DC0, DC, VEO, SS report Δs = sim(A,B) - sim(C,D); GED and λ-d report Δd = d(C,D) - d(A,B))
B10 | mB10 | B10 | w5B10 | 0.09 | 0.08 | -0.02 | 10^-… | -1 | 3.67 | 5.61 | 84.44
mmB10 | B10 | mmB10 | w5B10 | 0.10 | 0.10 | 0 | 10^-… | 0 | 4.57 | 7.60 | 95.61
B10 | mB10 | w5B10 | w2B10 | 0.06 | 0.06 | -0.02 | 10^-… | -1 | 2.55 | 3.77 | 66.71
w5B10 | w2B10 | w5B10 | mmB10 | 0.10 | 0.07 | 0.02 | 10^-… | 1 | 2.23 | 3.55 | 31.04
w5B10 | w2B10 | w5B10 | B10 | 0.03 | 0.02 | 0 | 10^-… | 0 | 1.12 | 1.84 | 17.73

Table 6: "Edge-Submodularity" (P3). Non-positive entries violate P3.

A | B | C | D | DC0 | DC | VEO | SS | GED (XOR) | λ-d Adj. | λ-d Lap. | λ-d N.L.
(DC0, DC, VEO, SS report Δs = sim(A,B) - sim(C,D); GED and λ-d report Δd = d(C,D) - d(A,B))
K5 | mK5 | C5 | mC5 | 0.03 | 0.03 | 0.02 | 10^-… | 0 | -0.24 | -0.59 | -7.77
C5 | mC5 | P5 | mP5 | 0.03 | 0.01 | 0.01 | -10^-… | 0 | -0.55 | -0.39 | -0.20
P5 | mP5 | S5 | mS5 | 0.003 | 0.001 | 0 | -10^-5 | 0 | -0.07 | 0.39 | 3.64
K100 | mK100 | C100 | mC100 | 0.03 | 0.02 | 0.002 | 10^-… | 0 | -1.16 | -1.69 | -311
C100 | mC100 | P100 | mP100 | 10^-… | 0.01 | 10^-… | -10^-… | 0 | -0.08 | -0.06 | -0.08
P100 | mP100 | S100 | mS100 | 0.05 | 0.03 | 0 | … | 0 | -0.08 | 1.16 | 196
K100 | m10K100 | C100 | m10C100 | 0.10 | 0.08 | 0.02 | 10^-… | 0 | -3.48 | -4.52 | -1089
C100 | m10C100 | P100 | m10P100 | 0.001 | 0.001 | 10^-… | … | 0 | -0.03 | 0.01 | 0.31
P100 | m10P100 | S100 | m10S100 | 0.13 | 0.07 | 0 | -10^-… | 0 | -0.18 | 8.22 | 1873



Figure 3: "Focus-Awareness" (P4). DeltaCon scores for random vs. targeted changes: DeltaCon similarity score for targeted changes (x axis) vs. for random changes (y axis), one line per network (Email EU, Epinions, Web Google, AS Skitter) across the levels of corruption of the baseline graph; the diagonal y = x marks equal scores, on or below which P4 would be violated. Annotation: "Targeted change hurts more."



Tables 4-6/Figure 3: DeltaCon0 and DeltaCon obey all the required properties (P1-P4). Tables 4-6: Each row of the tables corresponds to a comparison between the similarities (or distances) of two pairs of graphs: pairs (A,B) and (A,C) for (P1), and pairs (A,B) and (C,D) for (P2) and (P3). Non-positive values of Δs = sim(A,B) - sim(C,D) and Δd = d(C,D) - d(A,B) (with (A,C) in place of (C,D) for P1), depending on whether the corresponding method computes similarity or distance, mean violation of the property of interest. Figure 3: Targeted changes hurt more than random ones. Plot of DeltaCon similarity scores for random changes (y axis) vs. DeltaCon similarity scores for targeted changes (x axis) for 4 real-world networks. For each graph we create 8 "corrupted" versions with 10% to 80% fewer edges than the initial graph. Notice that all the points are above the diagonal.



The results for the first 3 properties are presented in the form of Tables 4-6. For property P1, we compare the graphs (A,B) and (A,C) and report the difference between the pairwise similarities/distances for our proposed methods and the 6 state-of-the-art methods. We have arranged the pairs of graphs in such a way that (A,B) are more similar than (A,C). Therefore, table entries that are non-positive mean that the corresponding method does not satisfy the property. Similarly, for properties P2 and P3, we compare the graphs (A,B) and (C,D) and report the difference in their pairwise similarity/distance scores.

P1. Edge Importance: "Edges whose removal creates disconnected components are more important than other edges whose absence does not affect the graph connectivity. The more important an edge is, the more it should affect the similarity or distance measure."

For this experiment we use the barbell, "wheel barbell" and "lollipop" graphs, since it is easy to argue 
about the importance of the individual edges. The idea is that edges in a highly connected component (e.g. 
clique, wheel) are not very important from the information flow viewpoint, while edges that connect (almost 
uniquely) dense components play a significant role in the connectivity of the graph and the information flow. 
The importance of the "bridge" edge depends on the size of the components that it connects; the bigger the 
components the more important is the role of the edge. 

Observation 1. Only DeltaCon succeeds in distinguishing the importance of the edges (P1) w.r.t. connectivity, while all the other methods fail at least once (Table 4).

P2. Weight Awareness: "The absence of an edge of big weight is more important than the absence of a smaller weighted edge; this should be reflected in the similarity measure."



The weight of an edge defines the strength of the connection between two nodes, and, in this sense, can 
be viewed as a feature that relates to the importance of the edge in the graph. For this property, we study the 
weighted versions of the barbell graph, where we assume that all the edges except the "bridge" have unit weight. 

Observation 2. All the methods are weight-aware (P2), except VEO and GED, which compute just the overlap in edges and vertices between the graphs (Table 5).

P3. "Edge-Submodularity": "Let $A(\mathcal{V}, \mathcal{E}_1)$ and $B(\mathcal{V}, \mathcal{E}_2)$ be two graphs with the same node set, and with $|\mathcal{E}_1| > |\mathcal{E}_2|$ edges. Also, assume that $m_xA(\mathcal{V}, \mathcal{E}_1)$ and $m_xB(\mathcal{V}, \mathcal{E}_2)$ are the respective derived graphs after removing $x$ edges. We expect that $sim(A, m_xA) > sim(B, m_xB)$, since the fewer the edges in a constant-sized graph, the more "important" they are."

The results for different graph topologies and 1 or 10 removed edges (prefixes 'm' and 'm10', respectively) are given compactly in Table 6. Recall that non-positive values denote violation of the "edge-submodularity" property.

Observation 3. Only DeltaCon complies with the "edge-submodularity" property (P3) in all cases examined.

P4. Focus Awareness: At this point, all the competing methods have failed to satisfy at least one of the desired properties. To test whether DeltaCon is able to distinguish the extent of a change in a graph, we analyze real datasets with up to 11 million edges (Table 2) for two different types of changes. For each graph we create corrupted instances by removing: (i) edges from the original graph randomly, and (ii) the same number of edges in a targeted way (we randomly choose nodes and remove all their edges, until we have removed the appropriate fraction of edges).
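A small sketch of the two corruption procedures, under our own assumptions: the graph is an undirected edge list; the targeted procedure deletes all edges of randomly chosen nodes until at least the requested fraction is gone (so it may overshoot slightly); and the random procedure then deletes exactly as many edges as the targeted one did, so both corrupted graphs end up with the same number of edges. The helper names and the toy grid graph are hypothetical.

```python
import random


def targeted_removal(edges, fraction, rng):
    """Delete all edges of randomly chosen nodes until >= fraction of edges are gone."""
    target = round(fraction * len(edges))
    nodes = sorted({u for e in edges for u in e})
    rng.shuffle(nodes)
    kept = list(edges)
    for node in nodes:
        if len(edges) - len(kept) >= target:
            break
        kept = [e for e in kept if node not in e]  # O(m) per node: fine for a sketch
    return kept


def random_removal(edges, num, rng):
    """Delete `num` edges chosen uniformly at random."""
    drop = set(rng.sample(range(len(edges)), num))
    return [e for idx, e in enumerate(edges) if idx not in drop]


def corrupt(edges, fraction, seed=0):
    """Return (random_version, targeted_version) with the same number of edges removed."""
    rng = random.Random(seed)
    targeted = targeted_removal(edges, fraction, rng)
    randomized = random_removal(edges, len(edges) - len(targeted), rng)
    return randomized, targeted


# Hypothetical toy graph: a 6 x 6 grid (60 edges)
edges = [(r * 6 + c, r * 6 + c + 1) for r in range(6) for c in range(5)]
edges += [(r * 6 + c, (r + 1) * 6 + c) for r in range(5) for c in range(6)]
for level in (0.1, 0.4, 0.8):
    rnd, tgt = corrupt(edges, level, seed=1)
    print(level, len(edges), len(rnd), len(tgt))
```

Comparing each corrupted version against the original graph with a DeltaCon routine gives one (random, targeted) point per corruption level, as plotted in Fig. 3.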

In Fig. 3, for each of the 4 real networks (Email EU, Epinions, Web Google and AS Skitter), we give the pair ($sim_{DeltaCon}^{random}$, $sim_{DeltaCon}^{targeted}$) for each of the different levels of corruption (10%, 20%, ..., 80%). That is, for each corruption level and network, there is a point whose coordinates are the similarity score between the original graph and the corrupted graph when the edge removal is random, and the score when the edge removal is targeted. The line y = x corresponds to equal similarity scores for both ways of removing edges.



Observation 4. • "Targeted changes hurt more." DeltaCon is focus-aware (P4). Removal of edges in a targeted way leads to smaller similarity of the derived graph to the original one than removal of the same number of edges in a random way.
• "More changes: random ≈ targeted." As the corruption level increases, the similarity score for random changes tends to the similarity score for targeted changes (in Fig. 3, all lines converge to the y = x line for greater levels of corruption).

This is expected as the random and targeted edge removal tend to be equivalent when a significant fraction of 
edges is deleted. 

General Remarks. All in all, the baseline methods have several undesirable properties. The spectral methods, as well as SS, fail to comply with the "edge importance" (P1) and "edge submodularity" (P3) properties. Moreover, λ-distance has a high computational cost when the whole graph spectrum is computed, cannot distinguish the differences between co-spectral graphs, and sometimes small changes lead to big differences in the graph spectra. As far as VEO and GED are concerned, they are oblivious to significant structural properties of the graphs; thus, despite their straightforwardness and fast computation, they fail to discern various changes in the graphs. On the other hand, DeltaCon gives tangible similarity scores and conforms to all the desired properties.



4.2 Scalability of DeltaCon. In Section 3 we demonstrated that DeltaCon is linear in the number of edges, and here we show that this also holds in practice. We ran DeltaCon on Kronecker graphs (Table 2), which are known [13] to share many properties with real graphs.

Observation 5. As shown in Fig. 4, DeltaCon scales linearly with the number of edges in the graph.

Notice that the algorithm can be trivially parallelized by finding the node affinity scores of the two graphs in parallel instead of sequentially. Moreover, for each graph, the computation of the similarity scores of the nodes to each of the g groups can be parallelized. However, the runtimes in our experiments refer to the sequential implementation.
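The per-group solves are independent, which is what makes this parallelization trivial. The sketch below (hypothetical helper names; a fixed-point solver stands in for the authors' Matlab code) distributes the g group solves with the thread pool from Python's standard concurrent.futures module and checks that the result matches the sequential loop. Because of CPython's global interpreter lock, threads only illustrate the structure here and give no real speedup for this pure-Python solver; a process pool or a native sparse solver would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_fabp(adj, rhs, tol=1e-13, max_iter=100_000):
    """Fixed-point solve of [I + eps^2 D - eps A] x = rhs for one group seed vector."""
    eps = 1.0 / (1.0 + max(len(nbrs) for nbrs in adj))
    x = list(rhs)
    for _ in range(max_iter):
        new = [rhs[i] + eps * sum(x[j] for j in adj[i]) - eps * eps * len(adj[i]) * x[i]
               for i in range(len(adj))]
        if max(abs(p - q) for p, q in zip(new, x)) < tol:
            return new
        x = new
    raise RuntimeError("fixed-point iteration did not converge")


def seed_vector(n, members):
    """0/1 membership vector s_0k of one group."""
    return [1.0 if i in members else 0.0 for i in range(n)]


def reduced_affinities(adj, groups, workers=4):
    """Columns of S' computed concurrently, one task per group (order preserved)."""
    n = len(adj)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda members: solve_fabp(adj, seed_vector(n, members)), groups))


# Hypothetical example: a 12-node cycle split into g = 3 groups
n = 12
adj = [{(i - 1) % n, (i + 1) % n} for i in range(n)]
groups = [set(range(k, n, 3)) for k in range(3)]
parallel = reduced_affinities(adj, groups)
sequential = [solve_fabp(adj, seed_vector(n, m)) for m in groups]
print(parallel == sequential)  # True: each group runs the same deterministic computation
```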
Figure 4: DeltaCon is linear in the number of edges (time in sec. vs. number of edges). The exact number of edges is annotated.



5 DeltaCon at Work 

In this section we present two applications of graph similarity measures; we use DeltaCon and report our 
findings. 

5.1 Enron. First, we analyze the time-evolving ENRON graph. Figure 5 depicts the similarity scores between 
consecutive daily who-emailed-whom graphs. By applying quality control with an Individual Moving Range chart, we 
obtain the lower and upper limits of the in-control similarity scores. These limits correspond to median ± 3σ (the 
median is used instead of the mean, since appropriate hypothesis tests demonstrate that the data do not follow 
the normal distribution; the moving range mean is used to estimate σ). Using this method we were able to define the 
threshold (lower control limit) below which the corresponding days are anomalous, i.e. they differ "too much" 
from the previous and following days. Note that all the anomalous days relate to crucial events in the company's 
history during 2001 (points marked with red boxes in Fig. 5): (2) 8/21, Lay emails all employees stating he wants 
"to restore investor confidence in Enron."; (3) 9/26, Lay tells employees that the accounting practices are "legal 
and totally appropriate", and that the stock is "an incredible bargain."; (4) 10/5, just before Arthur Andersen 
hired the law firm Davis Polk & Wardwell to prepare a defense for the company; (5) 10/24-25, Jeff McMahon takes 
over as CFO, and an email to all employees states that all the pertinent documents should be preserved; (6) 11/8, Enron 
announces it overstated profits by 586 million dollars over 5 years. 
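The control-limit rule can be sketched in a few lines. The text states only that the median is the center line and that the moving-range mean is used to estimate σ; dividing the mean moving range by the conventional I-MR constant d2 = 1.128 (for ranges of two consecutive points) is our assumption, and the function names are hypothetical.

```python
import statistics


def control_limits(scores, d2=1.128):
    """Return (LCL, UCL) = median -/+ 3*sigma, with sigma from the mean moving range.

    d2 = 1.128 is the usual I-MR constant for moving ranges of two points
    (an assumption; the text only says the moving-range mean is used).
    """
    moving_ranges = [abs(b - a) for a, b in zip(scores, scores[1:])]
    sigma = statistics.mean(moving_ranges) / d2
    center = statistics.median(scores)
    return center - 3 * sigma, center + 3 * sigma


def anomalous_days(scores):
    """Indices of similarity scores below the lower control limit."""
    lcl, _ = control_limits(scores)
    return [t for t, s in enumerate(scores) if s < lcl]
```

For example, on the made-up series `[0.9, 0.91, 0.89, 0.9, 0.5, 0.9, 0.92, 0.9]` the lower limit is about 0.566, so `anomalous_days` returns `[4]`.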

Although high similarities between consecutive days do not constitute anomalies, we found that high similarities 
mostly occur on weekends. For instance, the first two points of 100% similarity correspond to the 
weekend before Christmas in 2000 and a weekend in July, when only two employees sent emails to each other. 
It is noticeable that after February 2002 many consecutive days are very similar; this happens because, after the 
collapse of Enron, the email exchange activity was rather low and often between certain employees. 

Figure 5: Graph Anomaly Detection with DeltaCon. The marked days correspond to anomalies and coincide with major 
events in the history of Enron. The blue points are similarity scores between consecutive instances of the daily email activity 
between the employees, and the marked days are 3σ units away from the median similarity score. 

5.2 Brain Connectivity Graph Clustering. We also use DeltaCon for clustering and classification. For this 
purpose we study connectomes (brain graphs), which are obtained by Multimodal Magnetic Resonance Imaging. 

In total we study the connectomes of 114 people; each consists of 70 cortical regions (nodes) and connections 
(weighted edges) between them. We ignore the strength of connections and derive an undirected, unweighted 
brain graph per person. In addition to the connectomes, we have attributes for each person (e.g., age, gender, IQ). 

We first get the DeltaCon pairwise similarities between the brain graphs, and then perform hierarchical 
clustering using Ward's method (Fig. 1(b)). As shown in the figure, there are two clearly separable groups of 
brain graphs. Applying a t-test on the available attributes for the two groups created by the clusters, we found 
that the groups differ significantly (p-value = 0.0057) in the Composite Creativity Index (CCI), which is related to 
the person's performance on a series of creativity tasks. Moreover, the two groups correspond to low and high 
openness index (p-value = 0.0558), one of the "Big Five Factors"; that is, the brain connectivity is different in 
people that are inventive and people that are consistent. Using analysis of variance (ANOVA, a generalization 
of the t-test to more than two groups), we tested whether the various clusters that we obtain from 
hierarchical clustering reflect the structural differences in the brain graphs. However, in the dataset we studied 
there is insufficient statistical evidence that age, gender, IQ, etc. are related to the brain connectivity. 
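The clustering-and-testing pipeline can be sketched with SciPy as below. Converting similarities to distances as 1 − sim is our assumption (the text does not say how this was done), and Ward linkage formally presumes Euclidean distances, so treat this as an illustration rather than the exact procedure; `cluster_and_test` is a hypothetical name.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import ttest_ind


def cluster_and_test(sim, attribute, n_clusters=2):
    """Ward clustering of a pairwise-similarity matrix, then a t-test on one attribute."""
    dist = 1.0 - np.asarray(sim, dtype=float)   # assumed similarity-to-distance map
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="ward")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")  # labels are 1..n_clusters
    attribute = np.asarray(attribute, dtype=float)
    # Two-sample t-test between clusters 1 and 2 (ANOVA would cover more clusters)
    result = ttest_ind(attribute[labels == 1], attribute[labels == 2])
    return labels, result.pvalue
```

With more than two clusters, the t-test line would be replaced by an ANOVA over all clusters, as in the text.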

6 Related Work 

Graph Similarity. Graph similarity problems fall into two main categories. (1) With Known Node Correspondence. 
Papadimitriou et al. [5] propose 5 similarity measures for directed web graphs. Among them the best is the 
Signature Similarity (SS), which is based on the SimHash algorithm, while the Vertex/Edge Overlap similarity 
(VEO) also performs very well. Bunke [4] presents techniques used to track sudden changes in communications 
networks for performance monitoring. The best approaches are the Graph Edit Distance and Maximum Common 
Subgraph. Both are NP-complete, but the former approach can be simplified given the application, and it then becomes 
linear in the number of nodes and edges in the graphs. (2) With Unknown Node Correspondence. Three approaches 
can be used: (a) feature extraction and similarity computation, (b) graph matching and application of techniques 
from the first category [15], and (c) graph kernels [16]. The research directions in this category include: λ-distance 
([4], |[Tn, [12]), a spectral method that has been studied thoroughly; algebraic connectivity [17]; an SVM-based 
approach on global feature vectors [18]; social networks similarity [19]; computing edge curvatures under heat 
kernel embedding [20]; comparison of the number of spanning trees [21]; fast random walk graph kernel [22]. 

Both research directions are important, but they apply in different settings; if the node correspondence is available, 
algorithms that make use of it can only perform better than methods that omit it. Here we tackle the former 
problem. 

Node affinity algorithms. There are numerous node affinity algorithms; PageRank [23], Personalized 
Random Walks with Restarts [24], the electric network analogy [25], SimRank [26], and Belief Propagation [27] 
are only some examples of the most successful techniques. Here we focus on the latter method, and specifically on 
a fast variation [28] which is also intuitive. All the techniques have been used successfully in many tasks, such 
as ranking, classification, malware and fraud detection ([29], [30]), and recommendation systems [31]. 



7 Conclusions 

In this work, we tackle the problem of graph similarity when the node correspondence is known (e.g., similarity 
in time-evolving phone networks). Our contributions are: 

• Axioms/Properties: we formalize the problem of graph similarity by providing axioms, and desired 
properties. 

• Algorithm: We propose DeltaCon, an algorithm that is (a) principled (axioms A1-A3, in Sec. 2), (b) 
intuitive (properties P1-P4, in Sec. 4), and (c) scalable, needing ~160 seconds on commodity hardware for 
a graph with over 67 million edges. 

• Experiments: We evaluate the intuitiveness of DeltaCon, and compare it to 6 state-of-the-art measures. 

• Applications: We use DeltaCon for temporal anomaly detection (ENRON), and clustering & classification 
(brain graphs). 

Future work includes parallelizing our algorithm, as well as partitioning the graphs in a more informative 
way than randomly (e.g., using an elimination tree). 
A Appendix 

A.1 From Fast Belief Propagation (FaBP) to DeltaCon. FaBP [28] is a fast approximation of the loopy 
BP algorithm, which is guaranteed to converge and is described by the linear equation 

$[I + aD - c'A]\, s = s_0$, 

where $s_0$ is the vector of prior scores, $s$ is the vector of final scores (beliefs), $a = 4h_h^2/(1 - 4h_h^2)$ and 
$c' = 2h_h/(1 - 4h_h^2)$ are small constants, and $h_h$ is a constant that encodes the influence between neighboring 
nodes. By (a) using the Maclaurin expansion for $1/(1 - 4h_h^2)$ and omitting the terms of power greater than 2, 
(b) setting $s_0 = e_i$ and $s = s_i$, and (c) setting $h_h = \epsilon/2$, we obtain eq. (2.1), the core formula of DeltaCon. 
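Spelled out, steps (a) to (c) give the two constants of eq. (2.1). The derivation below is our expansion of the text above, keeping only terms up to second order in $h_h$:

```latex
\begin{aligned}
\frac{1}{1-4h_h^2} &= 1 + 4h_h^2 + 16h_h^4 + \cdots \\
a &= \frac{4h_h^2}{1-4h_h^2} = 4h_h^2 + O(h_h^4) \approx 4h_h^2 = \epsilon^2 \qquad (h_h = \epsilon/2) \\
c' &= \frac{2h_h}{1-4h_h^2} = 2h_h + O(h_h^3) \approx 2h_h = \epsilon \\
[I + aD - c'A]\, s = s_0 \;&\Longrightarrow\; [I + \epsilon^2 D - \epsilon A]\, s_i = e_i \qquad (s_0 = e_i,\ s = s_i)
\end{aligned}
```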
A.2 Connection between FaBP and Personalized RWR 



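Theorem 2. The FaBP equation (2.1) can be written in the Personalized RWR-like form: 

$[I - (1 - c'')A_* D^{-1}]\, s_i = c''\, y$, 
where $c'' = 1 - \epsilon$, $y = A_* D^{-1} A^{-1} \frac{1}{c''} e_i$ (which requires $A$ to be invertible), and $A_* = D(I + \epsilon^2 D)^{-1} D^{-1} A D$. 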



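Proof. We begin from the derived FaBP equation (2.1) and do simple linear algebra operations: 

$[I + \epsilon^2 D - \epsilon A]\, s_i = e_i$   (× $D^{-1}$ from the left) 

$[D^{-1} + \epsilon^2 I - \epsilon D^{-1} A]\, s_i = D^{-1} e_i$   ($F = D^{-1} + \epsilon^2 I$) 

$[F - \epsilon D^{-1} A]\, s_i = D^{-1} e_i$   (× $F^{-1}$ from the left) 

$[I - \epsilon F^{-1} D^{-1} A]\, s_i = F^{-1} D^{-1} e_i$   ($A_* = F^{-1} D^{-1} A D$) 

$[I - \epsilon A_* D^{-1}]\, s_i = (1 - \epsilon)\left(A_* D^{-1} A^{-1} \frac{1}{1 - \epsilon} e_i\right)$ □ 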
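Theorem 2 can be sanity-checked numerically. The sketch below assumes a triangle graph (chosen because its adjacency matrix is invertible, which the right-hand side needs) and an arbitrary small ε = 0.2; it solves the FaBP form and the RWR-like form and compares the two solutions.

```python
import numpy as np

# Triangle K3: its adjacency matrix is invertible, as the RWR-like form requires
A = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
D = np.diag(A.sum(axis=1))
I = np.eye(3)
eps = 0.2                                   # hypothetical small value
e1 = np.array([1.0, 0.0, 0.0])

# FaBP / DeltaCon form: [I + eps^2 D - eps A] s = e_1
s_fabp = np.linalg.solve(I + eps**2 * D - eps * A, e1)

# RWR-like form of Theorem 2
c2 = 1 - eps                                # c'' in the theorem
Dinv = np.linalg.inv(D)
A_star = D @ np.linalg.inv(I + eps**2 * D) @ Dinv @ A @ D
y = A_star @ Dinv @ np.linalg.inv(A) @ e1 / c2
s_rwr = np.linalg.solve(I - (1 - c2) * A_star @ Dinv, c2 * y)
```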

A.3 Proofs for Section 3 

Lemma A.1. The time complexity of DeltaCon is linear in the number of edges in the graphs, i.e. 

$O(g \cdot \max\{m_1, m_2\})$. 



Proof. By using the Power Method [28], the complexity of solving eq. (2.1) is $O(|E_i|)$ 
for each graph (i = 1, 2). The node partitioning needs O(n) time; the affinity algorithm is run g times in 
each graph, and the similarity score is computed in O(gn) time. Therefore, the complexity of DeltaCon is 
$O((g + 1)n + g(m_1 + m_2))$, where g is a small constant. Unless the graphs are trees, $|E_i| \ge n$, so the complexity 
of the algorithm reduces to $O(g(m_1 + m_2))$. Assuming that the affinity algorithm is run on the graphs in parallel, 
since there is no dependency between the computations, DeltaCon has complexity $O(g \cdot \max\{m_1, m_2\})$. □ 
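To see why each solve costs O(|E_i|) per iteration, here is a minimal pure-Python sketch of the fixed-point (power) iteration on adjacency lists, assuming the update s ← e + (εA − ε²D)s; running it once per group gives the O(g(n + m)) total of the lemma when the number of iterations is treated as a constant. The function name is hypothetical.

```python
def affinity_power_method(adj, seeds, eps, tol=1e-12, max_iter=10_000):
    """Solve [I + eps^2 D - eps A] s = s0, with s0 the indicator vector of `seeds`.

    Each sweep visits every node once and every edge twice (once from each
    endpoint), i.e. O(n + m) work, so a constant number of sweeps is linear.
    """
    n = len(adj)
    s0 = [0.0] * n
    for v in seeds:
        s0[v] = 1.0
    s = s0[:]
    for _ in range(max_iter):
        # s <- s0 + eps * A s - eps^2 * D s, computed from the adjacency lists
        new = [s0[v] + eps * sum(s[u] for u in adj[v]) - eps**2 * len(adj[v]) * s[v]
               for v in range(n)]
        if max(abs(a - b) for a, b in zip(new, s)) < tol:
            return new
        s = new
    return s
```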

Lemma A.2. The affinity score of each node to a group (computed by DeltaCon) is equal to the sum of the 
affinity scores of the node to each one of the nodes in the group individually (computed by DeltaCon_0). 

Proof. Let $B = I + \epsilon^2 D - \epsilon A$. Then DeltaCon_0 consists of solving, for every node $i \in V$, the equation 
$B s_i = e_i$, while DeltaCon solves the equation $B s'_k = s_{0k}$ for all groups $k \in (0, g]$, where $s_{0k} = \sum_{i \in group_k} e_i$. 
Because the system is linear, $s'_k = B^{-1} s_{0k} = \sum_{i \in group_k} B^{-1} e_i$, so it holds true that $s'_k = \sum_{i \in group_k} s_i$ for all groups k. □ 
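Lemma A.2 can be checked numerically. The sketch below uses a small random graph and an arbitrary group (both hypothetical) with ε = 1/(1 + max degree) as an assumed choice, and compares one group solve against the sum of the individual DeltaCon_0 solves.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
upper = np.triu(rng.integers(0, 2, size=(n, n)), k=1)
A = (upper + upper.T).astype(float)        # random undirected graph, no self-loops
D = np.diag(A.sum(axis=1))
eps = 1.0 / (1.0 + D.max())                # assumed choice of epsilon
B = np.eye(n) + eps**2 * D - eps * A

S = np.linalg.solve(B, np.eye(n))          # DeltaCon_0: column i solves B s_i = e_i
group = [0, 2, 5]                          # hypothetical group of nodes
s0_group = np.zeros(n)
s0_group[group] = 1.0
s_group = np.linalg.solve(B, s0_group)     # DeltaCon: one solve for the whole group
print(np.allclose(s_group, S[:, group].sum(axis=1)))  # expected: True
```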

Theorem 3. DeltaCon's similarity score between any two graphs $G_1, G_2$ upper bounds the actual 
DeltaCon_0's similarity score, i.e. $sim_{DC_0}(G_1, G_2) \le sim_{DC}(G_1, G_2)$. 

Proof. Let $S_1, S_2$ be the $n \times n$ final-scores matrices of $G_1$ and $G_2$ obtained by applying DeltaCon_0, 
and $S'_1, S'_2$ be the respective $n \times g$ final-scores matrices obtained by applying DeltaCon. We want to show that 
DeltaCon_0's distance 

$d_{DC_0} = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} \left(\sqrt{s_{1,ij}} - \sqrt{s_{2,ij}}\right)^2}$ 

is at least as large as DeltaCon's distance 

$d_{DC} = \sqrt{\sum_{k=1}^{g} \sum_{i=1}^{n} \left(\sqrt{s'_{1,ik}} - \sqrt{s'_{2,ik}}\right)^2}$ 

or, equivalently, that $d_{DC_0}^2 \ge d_{DC}^2$. It is sufficient to show that, for one group of DeltaCon, the corresponding 
summands in $d_{DC}$ are no larger than the summands in $d_{DC_0}$ that are related to the nodes that belong to the group. 
By extracting the terms in the squared distances that refer to one group of DeltaCon and its member nodes in 
DeltaCon_0, and by applying Lemma A.2, we obtain the following terms: 

$t_{DC_0} = \sum_{i=1}^{n} \sum_{j \in group} \left(\sqrt{s_{1,ij}} - \sqrt{s_{2,ij}}\right)^2$ 

$t_{DC} = \sum_{i=1}^{n} \left(\sqrt{\sum_{j \in group} s_{1,ij}} - \sqrt{\sum_{j \in group} s_{2,ij}}\right)^2$ 

Next we concentrate again on a selection of summands (e.g. i = 1), expand the squares, and use the Cauchy-Schwarz 
inequality to show that 

$\sum_{j \in group} \sqrt{s_{1,ij}\, s_{2,ij}} \le \sqrt{\sum_{j \in group} s_{1,ij}}\, \sqrt{\sum_{j \in group} s_{2,ij}}$, 

or equivalently that $t_{DC_0} \ge t_{DC}$. □ 
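Because the inequality in Theorem 3 rests only on Lemma A.2 and Cauchy-Schwarz, it can be sanity-checked with any nonnegative score matrices. In the sketch below, random matrices stand in for $S_1, S_2$ (an assumption, not real DeltaCon_0 output), and their per-group column sums play the role of $S'_1, S'_2$.

```python
import numpy as np


def rooted(S1, S2):
    """Root Euclidean distance between two nonnegative score matrices."""
    return np.sqrt(np.sum((np.sqrt(S1) - np.sqrt(S2)) ** 2))


rng = np.random.default_rng(1)
n, g = 8, 3
S1 = rng.random((n, n))                    # stand-in for DeltaCon_0 scores of G1
S2 = rng.random((n, n))                    # stand-in for DeltaCon_0 scores of G2
groups = [list(range(k, n, g)) for k in range(g)]
# Lemma A.2: DeltaCon's n x g matrices are the DeltaCon_0 columns summed per group
S1g = np.column_stack([S1[:, grp].sum(axis=1) for grp in groups])
S2g = np.column_stack([S2[:, grp].sum(axis=1) for grp in groups])
d_dc0, d_dc = rooted(S1, S2), rooted(S1g, S2g)
print(d_dc0 >= d_dc)                       # True, as Theorem 3 predicts
```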

A.4 Satisfaction of the Axioms. Here we elaborate on the satisfaction of the axioms by DeltaCon_0 and 
DeltaCon. 

A1. Identity Property: $sim(G_1, G_1) = 1$. 

The proof is straightforward; the affinity scores are identical for both graphs. 

A2. Symmetric Property: $sim(G_1, G_2) = sim(G_2, G_1)$. 

The proof is straightforward for DeltaCon_0. For the randomized algorithm, DeltaCon, it can be shown 
that $sim(G_1, G_2) = sim(G_2, G_1)$ on average. 

A3. Zero Property: $sim(G_1, G_2) \to 0$ for $n \to \infty$, where $G_1$ is the clique graph ($K_n$), and $G_2$ is the empty 
graph (i.e., the edge sets are complementary). 

We restrict ourselves to a sketch of proof, since the proof is rather intricate. 

Proof. (Sketch of proof, Zero Property.) First we show that all the nodes in a clique get final scores in $\{s_g, s_{ng}\}$, 
depending on whether they are included in group g or not. Then, it can be demonstrated that the scores have finite 
limits, and specifically $\{s_g, s_{ng}\} \to \{\frac{n}{2g} + 1, \frac{n}{2g}\}$ as $n \to \infty$ (for finite $\frac{n}{g}$). Given this condition, it can be directly 
derived that the RootED between the S matrices of the empty graph and the clique becomes arbitrarily large. 

So, $sim(G_1, G_2) \to 0$ for $n \to \infty$. □ 
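The limits in the sketch can be checked numerically for a large clique. Here ε = 1/(1 + max degree) = 1/n is our assumed choice and the first n/g nodes form the group; with n = 400 and g = 4 the exact scores are already within about 0.4% of n/(2g) + 1 and n/(2g).

```python
import numpy as np

n, g = 400, 4
A = np.ones((n, n)) - np.eye(n)            # clique K_n
D = np.diag(A.sum(axis=1))                 # every degree is n - 1
eps = 1.0 / (1.0 + (n - 1))                # assumed eps = 1 / (1 + max degree) = 1/n
B = np.eye(n) + eps**2 * D - eps * A
s0 = np.zeros(n)
s0[: n // g] = 1.0                         # the group: first n/g nodes
s = np.linalg.solve(B, s0)
s_g, s_ng = s[0], s[-1]                    # in-group and out-of-group scores
limit = n / (2 * g)
print(s_g, limit + 1)                      # close to each other
print(s_ng, limit)
```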

A.5 Satisfaction of the Properties. Here we give some theoretical guarantees for the most important property, 
"edge importance" (P1). We prove that the property is satisfied in a special case; generalizing this proof, as 
well as proving the remaining properties, is theoretically interesting and remains future work. 

Special Case [Barbell graph]: Assume A is a barbell graph with $n_1$ and $n_2$ nodes in each clique (e.g., B10 
with $n_1 = n_2 = 5$ in Fig. 2), B does not have one edge in the first clique, and C does not have the "bridge" edge. 



Proof. From eq. (2.1), by using the Power method we obtain the solution: 

$s_i = [I + (\epsilon A - \epsilon^2 D) + (\epsilon A - \epsilon^2 D)^2 + \dots]\, e_i \Rightarrow$ 

$s_i \approx [I + \epsilon A + \epsilon^2 (A^2 - D)]\, e_i$, 

where we ignore the terms of greater than second power. By writing out the elements of the $S_A, S_B, S_C$ matrices 
as computed by DeltaCon_0 and the above formula, and also the RootED between graphs A, B and A, C, we 
obtain the following formula for their difference: 

$d(A, C)^2 - d(A, B)^2 = 2\epsilon\left\{\epsilon(n - f) + 1 - \left(\frac{2\epsilon^3 (n_1 - 2)}{c_1^2} + \frac{\epsilon}{c_2^2}\right)\right\}$, 

where $c_1 = \sqrt{\epsilon + \epsilon^2 (n_1 - 3)} + \sqrt{\epsilon + \epsilon^2 (n_1 - 2)}$ and $c_2 = \sqrt{\epsilon^2 (n_1 - 2)} + \sqrt{\epsilon + \epsilon^2 (n_1 - 2)}$. We can show that 
$c_1 > 2\sqrt{\epsilon}$ for $\epsilon < 1$ (which always holds) and $c_2 > \sqrt{\epsilon}$, where f = 3 if the missing edge in graph B is adjacent 
to the "bridge" node, and f = 2 in any other case. So, $d(A, C)^2 - d(A, B)^2 > 0$. 

This property is not always satisfied by the Euclidean distance. □ 
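The special case can also be checked numerically with DeltaCon_0 on B10 ($n_1 = n_2 = 5$). This sketch assumes a single ε = 1/(1 + max degree of A) shared by the three graphs, removes edge (0, 1), which is not adjacent to the bridge (so f = 2), and uses the RootED distance; the helper names are hypothetical.

```python
import numpy as np


def barbell(k):
    """Two k-cliques joined by a single 'bridge' edge between nodes k-1 and k."""
    A = np.zeros((2 * k, 2 * k))
    A[:k, :k] = 1.0
    A[k:, k:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[k - 1, k] = A[k, k - 1] = 1.0
    return A


def deltacon0_scores(A, eps):
    """Exact DeltaCon_0 score matrix [I + eps^2 D - eps A]^{-1}."""
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))
    S = np.linalg.inv(np.eye(n) + eps**2 * D - eps * A)
    return np.clip(S, 0.0, None)           # guard against tiny negative round-off


def rooted(S1, S2):
    return np.sqrt(np.sum((np.sqrt(S1) - np.sqrt(S2)) ** 2))


k = 5
GA = barbell(k)
GB = GA.copy()
GB[0, 1] = GB[1, 0] = 0.0                  # missing edge inside the first clique
GC = GA.copy()
GC[k - 1, k] = GC[k, k - 1] = 0.0          # missing bridge
eps = 1.0 / (1.0 + GA.sum(axis=1).max())   # shared eps (assumption)
SA, SB, SC = (deltacon0_scores(G, eps) for G in (GA, GB, GC))
d_AB, d_AC = rooted(SA, SB), rooted(SA, SC)
print(d_AB, d_AC)                          # removing the bridge should give the larger distance
```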



